

Section: New Results

Multi-Object Tracking using Multi-Channel Part Appearance Representation

Participants: Thi Lan Anh Nguyen, Furqan Muhammad Khan, Farhood Negin, François Brémond.

Keywords: Tracklet fusion, Multi-object tracking, Appearance Representation

Multi-object tracking (MOT) is one of the fundamental problems in computer vision and is essential for many applications (e.g., home care, house care, security systems). The main objective of MOT is to estimate the states of multiple objects while preserving their identities as their appearance and motion vary over time. The problem is made very challenging by frequent occlusion by the background or by other objects, and by variations in object pose and illumination.

Depending on when data association is performed, tracking algorithms can be categorized into two types: short-term and long-term tracking. Short-term trackers [108], [115] associate the object detections in the current frame with the best-matching object trajectories from the past. These methods can run online, since they rely on frame-to-frame association, and can therefore be used in real-time applications. In general, short-term trackers use bipartite matching for data association, with the Hungarian algorithm being the most popular method. Although these methods are computationally inexpensive, object identification can fail because of inaccurate detections (false alarms), and only short-term occlusions can be handled. Long-term trackers [139], [113] overcome these shortcomings by extending the bipartite matching into a network flow formulation. In [139], a directed acyclic graph is built whose vertices are object detections or short tracklets and whose edges are similarity links between vertices. In [113], the track of a person forms a clique and MOT is formulated as a constrained maximum-weight clique problem. The data association solutions for these long-term trackers are found with a minimum-cost flow algorithm. However, long-term tracking methods have clear drawbacks of their own: a high computational cost, caused by the iterative association process that generates globally optimized tracks, and the requirement that the object detections for the entire video be available in advance.
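
As an illustration of the frame-to-frame association step used by short-term trackers, the sketch below matches track positions to detections by minimizing the total Euclidean distance. It is only a minimal example under stated assumptions: the function name `associate`, the gating threshold `max_dist`, and the sample coordinates are hypothetical, and an exhaustive search over assignments stands in for the Hungarian algorithm (it finds the same optimum, but is only practical for a handful of objects).

```python
from itertools import permutations
from math import hypot


def associate(track_positions, detections, max_dist=50.0):
    """Return {track_index: detection_index} minimizing the summed distance.

    Exhaustive search over assignments (a stand-in for the Hungarian
    algorithm). Assumes len(track_positions) <= len(detections).
    """
    n_tracks = len(track_positions)
    # cost[i][j] = Euclidean distance between track i and detection j
    cost = [[hypot(tx - dx, ty - dy) for (dx, dy) in detections]
            for (tx, ty) in track_positions]

    best_total, best_perm = float("inf"), ()
    for perm in permutations(range(len(detections)), n_tracks):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm

    # Gating: drop matches that are too far apart to be plausible.
    return {i: j for i, j in enumerate(best_perm) if cost[i][j] <= max_dist}


# Hypothetical last known track positions and current-frame detections.
tracks = [(10.0, 10.0), (100.0, 50.0)]
detections = [(102.0, 51.0), (12.0, 11.0), (300.0, 300.0)]
matches = associate(tracks, detections)
print(matches)
```

The unmatched third detection would typically start a new track or be treated as a false alarm; a practical tracker would also replace the exhaustive search with a polynomial-time assignment solver.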

Recently, some trackers have combined short-term and long-term tracking in a single framework to perform online object tracking. The MOT methods in [44], [121] use frame-to-frame association to generate tracklets, followed by a tracklet association process that introduces a time-buffer latency. However, their performance is limited by their object features and tracklet representation. These methods use basic features (e.g., 2D information, color histograms or constant velocity) computed over the whole body and describe the object with a single Gaussian distribution. Such a representation can lose information that is important for discriminating between objects and, consequently, can fail to track objects in complex scene conditions (such as occlusion, low video resolution or insufficient lighting).
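
To make the information-loss argument concrete, the toy sketch below compares a one-Gaussian summary of an appearance feature with a two-mode summary. Everything in it is an assumption for illustration: the feature values are invented, the helper `two_means_1d` is hypothetical, and a plain 1-D k-means split stands in for fitting a mixture model.

```python
from statistics import mean, pstdev


def two_means_1d(values, iters=20):
    """Split 1-D values into two clusters with plain k-means (k=2)."""
    c0, c1 = min(values), max(values)  # initialise centres at the extremes
    for _ in range(iters):
        left = [v for v in values if abs(v - c0) <= abs(v - c1)]
        right = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right


# Invented hue-like feature of one person along a tracklet: one appearance
# mode in the first frames, a clearly different one in the later frames.
feature = [0.18, 0.20, 0.22, 0.21, 0.19, 0.80, 0.82, 0.78, 0.81, 0.79]

single_mean, single_std = mean(feature), pstdev(feature)
left, right = two_means_1d(feature)

print(f"one Gaussian : mean={single_mean:.2f} std={single_std:.2f}")
print(f"mode 1       : mean={mean(left):.2f} std={pstdev(left):.3f}")
print(f"mode 2       : mean={mean(right):.2f} std={pstdev(right):.3f}")
```

The single Gaussian places its mean between the two modes, at a value the person never actually exhibits, and has a wide spread; the two-mode summary keeps both appearances with tight spreads, which is the kind of detail a multi-modal representation is meant to preserve.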

On the other hand, multiple-shot person re-identification methods [89], [136], [31] have achieved high performance in matching objects across different camera views. To match a given person in one camera to the closest person in a gallery from another camera, these methods rely on effective features and object representations, designed to cope with variations in pose and camera viewpoint. Since person re-identification usually deals with identifying a person across different camera views, re-identification representations can be expected to be even more effective in the single-view multi-object tracking problem.

Therefore, we propose a robust online multi-object tracking method named MTSTracker, which extends the object representations and methods proposed in the re-identification domain to address problems in MOT. While re-identification works in offline mode, MTSTracker works in online mode: it uses a time-window buffer to extract tracklets and associates the tracklets in each time window using re-identification techniques. MTSTracker integrates a short-term tracker and a long-term tracker in a single framework. The short-term tracker generates object trajectories called tracklets. Object features are computed for the full body and for body parts, and each tracklet is then represented by a set of multi-modal feature distributions modeled by Gaussian mixture models (GMMs). The long-term tracker associates tracklets after missed detections or occlusions, based on a learned Mahalanobis distance between GMM components. To learn this metric, the KISSME algorithm [84] is adopted to learn feature transformations between different scenes by directly learning the transformation between probability distributions. Experiments on two public datasets, MOT2015 and ParkingLot, show that MTSTracker performs well compared to state-of-the-art tracking algorithms. This contribution has been published at the international conference AVSS 2017 [35].
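
For reference, the general form of a KISSME-style Mahalanobis metric can be written as follows. This is the standard formulation of the algorithm rather than a description of how MTSTracker applies it: the symbols $x_i$, $x_j$ (feature vectors, which here would presumably be derived from GMM components), $S$ (pairs belonging to the same object) and $D$ (pairs belonging to different objects) are notation introduced for this sketch, and the way pairs are formed from tracklets is not specified in this section.

```latex
\begin{aligned}
d_M^2(x_i, x_j) &= (x_i - x_j)^\top M \,(x_i - x_j), \\
M &= \Sigma_S^{-1} - \Sigma_D^{-1}, \\
\Sigma_S &= \frac{1}{|S|} \sum_{(i,j) \in S} (x_i - x_j)(x_i - x_j)^\top, \\
\Sigma_D &= \frac{1}{|D|} \sum_{(i,j) \in D} (x_i - x_j)(x_i - x_j)^\top .
\end{aligned}
```

Because $M$ is a difference of two inverse covariance matrices, it is not guaranteed to be positive semi-definite; a common remedy is to clip its negative eigenvalues to zero before using it as a distance.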